SL: Debugging _adapt_offline() method #746
Conversation
Hi @laxmikantbaheti , thank you once again for noticing the mistakes in MLPro-SL and explaining them in #744. I have updated the sampling and _adapt_offline methods accordingly. Could you review this and let me know if something is still not correct? Otherwise, I will merge this into the main branch.
Hi @steveyuwono , thanks for making the changes. Sure, I will review and let you know.
Hi @steveyuwono , I think here you are fetching data with the indices from the sampler and storing those specific data in the input and output keys of the trainer and tester dictionaries. This works, but I don't think you need to implement that entire logic: since you have already provided the sampler to the dataloader, you can simply iterate over it like an iterator, as follows:
it will return a tensor of all the tensors at the given indices of the sampler. However, the current implementation would still only return the first batch of data. In the case of howto_mbrl_gridworld or default parameters, this is fine, as the batch size is equal to the buffer size and thus the buffer contains only one batch. However, in cases where the buffer is larger and the batch size is smaller than half the buffer size, there would be more than one batch. In that case, the _adapt_offline() method should iterate over all the batches in the buffer, optimize after each batch's forward pass, and then end. So I suggest the sampling method can be left as it was previously, and in the _adapt_offline() method of the previous version, instead of doing `
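The per-batch loop described above can be sketched as follows. This is an illustrative, self-contained example, not MLPro-SL's actual code: the toy dataset, model, and the function name `adapt_offline` are assumptions chosen to mirror the discussion (buffer of 100 samples, batch size 20, so 5 batches per pass).

```python
# Hypothetical sketch: iterate over ALL batches of the DataLoader per
# adaptation call and optimize after each batch, instead of fetching only
# the first batch via next(iter(loader)).
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy buffer: 100 samples, batch size 20 -> 5 batches per full pass
buffer = TensorDataset(torch.randn(100, 4), torch.randn(100, 2))
loader = DataLoader(buffer, batch_size=20, shuffle=True)

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.MSELoss()

def adapt_offline(loader, model, optimizer, loss_fn):
    # One pass over every batch in the buffer; an optimization step follows
    # each batch's forward pass, and the method ends when batches run out.
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

adapt_offline(loader, model, optimizer, loss_fn)
```

With this shape, the number of optimization steps per call follows directly from the buffer and batch sizes (`len(loader)` batches), rather than from an externally supplied iteration count.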
How about: `
This might be one of the options.
Hi @laxmikantbaheti, previously I was using trainer["input"] = next(iter(trainer_loader_input)), but then I found out that the order of the indices is random. Thus, trainer["input"] and trainer["output"] can have different orders and no longer correspond when the network's prediction losses are calculated. Therefore, I fixed the batch in the sampling method. For instance, in the first sampling, the set train_indices is: However, I just found out that this issue can be solved through torch.manual_seed(). Hence, it can now be done in _adapt_offline. I also added p_num_iter as an input of _adapt_offline. Please check it once again and let me know. Thank you!
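The ordering problem and the `torch.manual_seed()` fix mentioned above can be demonstrated with a minimal sketch. This is an assumption-laden toy example (loader names and data are invented): two shuffled DataLoaders draw their permutations independently, so paired inputs and outputs get misaligned unless the global RNG is seeded identically right before each iteration.

```python
# Hypothetical sketch: two shuffled DataLoaders over the same data shuffle
# independently; seeding the global RNG before iterating each one makes
# both draw the same permutation, keeping inputs and outputs aligned.
import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.arange(10).float().unsqueeze(1)
loader_a = DataLoader(TensorDataset(data), batch_size=10, shuffle=True)
loader_b = DataLoader(TensorDataset(data), batch_size=10, shuffle=True)

# Seed the global RNG immediately before creating each iterator, so the
# internal RandomSampler of both loaders derives the same shuffle order.
torch.manual_seed(42)
batch_a = next(iter(loader_a))[0]
torch.manual_seed(42)
batch_b = next(iter(loader_b))[0]

assert torch.equal(batch_a, batch_b)  # identical order in both loaders
```

Without the two `torch.manual_seed(42)` calls, the assertion would generally fail, which is exactly the mismatch between trainer["input"] and trainer["output"] described in the comment.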
Ohh, now I understand. Just one thing: if we call iter on the dataloader at each call, it will reset and give only the first batch. And num_iterations is provided externally; I think it should be different. Let's say you have a buffer of size 100 and a batch size of 20, giving 5 different batches. Each call to next should give the next batch of 20, and _adapt_offline should end when all the batches are exhausted.
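The reset behaviour described above can be verified with a short sketch (toy data, illustrative sizes matching the 100/20 example in the comment): calling iter() on a DataLoader restarts iteration, so next(iter(loader)) only ever yields the first batch, whereas iterating a single iterator visits every batch once.

```python
# Sketch of the reset pitfall: each iter(loader) call restarts iteration.
import torch
from torch.utils.data import DataLoader, TensorDataset

data = torch.arange(100).unsqueeze(1)
loader = DataLoader(TensorDataset(data), batch_size=20, shuffle=False)

first = next(iter(loader))[0]   # first batch
again = next(iter(loader))[0]   # a fresh iterator -> first batch again
assert torch.equal(first, again)

# Iterating one iterator exhaustively visits all 5 batches instead:
batches = [b[0] for b in loader]
assert len(batches) == 5
```

This is why the number of adaptation steps should come from the loader itself (`len(loader)` batches) rather than from an external iteration count.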
Thanks very much again for all the changes!!
Hi @laxmikantbaheti , sorry, I did not notice your message yesterday. I fixed the iterations in _adapt_offline according to its batch size. I believe this works properly now. Shall I merge this into the main branch? Also, thank you for your review!
Hi @steveyuwono , thank you. It works perfectly.